1. Introduction

I  would  like  to  discuss  some  of  the  work  that  has  been 
done in designing experiments involving response functions
non-linear in at least one of the parameters. Formally, this
excludes  the  large  volume  of  work  on  the  planning  of  factorial 
experiments  and  on  the  estimation  of  multiple  regressions, 
including  polynomial  response  functions,  although  there  are 
many  similarities  in  both  the  methods  of  attack  and  the  results 
obtained  in  the  linear  and  non-linear  situations. 

This  is  not  an  area  to  which  Fisher  devoted  a  great  deal 
of  attention.  But  the  first  design  problem  for  which  he 
published  a  solution  was  non-linear.  This  was  in  his  1922 
paper  on  the  mathematical  foundations  of  theoretical  statistics, 
before  he  had  published  anything  on  either  the  analysis  of 
variance  or  on  randomization  and  the  design  of  agricultural 
experiments.  The  problem  is  the  estimation  of  the  density  of 
small  organisms  in  a  liquid  by  means  of  a  series  of  dilutions. 
This  problem  forms  a  convenient  introduction.  It  is  a  one- 
parameter  problem,  yet  illustrates  some  of  the  basic  features 
of  non-linear  problems. 

In  this  review  I  shall  try  to  concentrate  on  the  issues 
that  obviously  present  themselves,  the  methods  of  attack 
adopted,  the  progress  made  thus  far,  and  some  problems  still 
awaiting,  so  far  as  I  know,  published  research.  The  area  is 
an  exciting  one.  On  the  technical  side,  a  high  degree  of  both 
mathematical  and  computing  skill  is  required  in  the  more  complex 
problems. On the practical side, there is the important
question: is the research producing the kinds of results that
assist the investigator in what he regards as his main problems?
Equally important, and by no means easy: are we able to explain
the methods in terms that the experimenter can understand and
use?


2. Dilution series experiments


A  volume  V  of  a  liquid  contains  N  tiny  organisms,  thoroughly 
mixed  and  with  no  tendency  to  clumping  or  mutual  rejection.  A 
small  volume  x  is  taken  out.  The  probability  that  this  volume 
contains no organisms is

$$P = e^{-Nx/V} = e^{-\theta x}.$$

Here $\theta = N/V$, the density per unit volume, is the parameter to be
estimated,  while  x  corresponds  to  the  level  of  a  factor  which 
can  be  chosen  by  the  experimenter.  In  practice  a  standard 
volume  is  taken  out  by  pipette,  a  desired  x  being  obtained  by 
diluting  the  original  volume  with  pure  water.  The  lab  test  can 
detect  only  whether  the  sample  is  sterile  (contains  no  organisms) 
or  fertile  (contains  one  or  more  organisms). 

If  n  samples  are  drawn  for  given  x,  the  probability  that 
s are sterile is the binomial

$$\frac{n!}{s!\,(n-s)!}\,P^{s}Q^{\,n-s}, \qquad Q = 1 - P.$$

The criterion which Fisher selected can be described in
two equivalent ways. One is that he minimized the large-sample
formula for the coefficient of variation of the maximum likelihood
(ML) estimate of $\theta$. Fisher himself described it as
maximizing the sample information about $\log\theta$,
$I(\log\theta) = \theta^2 I(\theta)$, where

$$I(\log\theta) = \frac{n(\theta x)^2}{e^{\theta x} - 1}. \qquad (2.1)$$

He regarded this criterion as the natural one in small as well
as large samples, since he used the phrase "without any
large-sample approximation" in referring to it.

To maximize $I(\log\theta)$ in (2.1), the quantity $\theta x$ should be
set at 1.59, giving P=0.20. To find x such that $x\theta = 1.59$, we
need to know $\theta$. This is a standard feature that distinguishes
non-linear from linear problems. In a non-linear problem, the
statistician can say to the experimenter: "You tell me the
value of $\theta$ and I promise to design the best experiment for
estimating $\theta$". If the experimenter replies, "Who needs you?",
this is natural but not helpful.
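As a rough numerical check on the figures 1.59 and 0.20, the following Python sketch locates the maximum of (2.1). Setting the derivative of $u^2/(e^u - 1)$ to zero, with $u = \theta x$, gives the condition $2(1 - e^{-u}) = u$, which is solved here by bisection. The function names and the bracketing interval are illustrative choices, not part of Fisher's treatment.

```python
import math

def info_log_theta(u, n=1):
    """Information about log(theta) from n samples taken at theta*x = u, as in (2.1)."""
    return n * u * u / math.expm1(u)

def optimum_u(lo=0.5, hi=3.0, tol=1e-10):
    """Solve 2*(1 - exp(-u)) = u, the stationarity condition of (2.1), by bisection.

    The bracket [0.5, 3.0] is an assumed starting interval; the left-hand side
    exceeds u at 0.5 and falls below it at 3.0.
    """
    def g(u):
        return 2.0 * (1.0 - math.exp(-u)) - u
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u_opt = optimum_u()             # optimum value of theta*x
p_sterile = math.exp(-u_opt)    # corresponding probability of a sterile sample
```

With an initial guess for the density, the suggested sample volume would then be `u_opt` divided by that guess.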

What can be done in practice? Three possibilities suggest
themselves. With a good initial estimate $\theta_0$ of $\theta$, the
experimenter can use Fisher's solution, setting $x = 1.59/\theta_0$, and
assuming that he has a good if not an optimum experiment. In
Fisher's problem the value of $\theta$ is usually known poorly,
perhaps within limits $\theta_L$, $\theta_H$ whose ratio is 100 or 1,000 to 1.
The natural first question here is: can the experiment be done
sequentially? The first experiment has $x = 1.59/\theta_0$, where $\theta_0$
is perhaps a poor first guess. The second experiment has
$x = 1.59/\hat\theta_1$, where $\hat\theta_1$ is the M.L. estimate of $\theta$ from the first
experiment, and so on, creeping up on the best $\theta x$.

So far as I know, dilution series experiments are routinely
done non-sequentially in a single operation. If $\theta$ is thought
to lie between $\theta_L$ and $\theta_H$, Fisher's approach was not to optimize
anything, but to try to guarantee a specified expected value
of $I(\log\theta)$. In a series of two-fold dilutions, for example,
the percentage of the total information supplied by different
dilutions is shown in Table 1.

Table 1. $I(\log\theta)$ in percents at different levels of $\theta x$

$\theta x$   >=8    4      2      1      1/2    1/4    1/8    <=1/16
I (%)        0.9    12.6   26.4   24.5   16.2   9.3    4.9    5.2

The five dilutions from $\theta x = 4$ to $\theta x = 1/4$ provide 89% of the total
information. To ensure that these dilutions are covered, we
want $x_{\min}\theta_H \le 1/4$ and $x_{\max}\theta_L \ge 4$. This gives $x_{\max}/x_{\min} \ge 16\,\theta_H/\theta_L$.
With $\theta_H/\theta_L = 100$, twelve two-fold dilutions suffice to cover this
range, and 15 when $\theta_H/\theta_L = 1,000$.
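The percentages in Table 1 and the dilution counts just quoted can be checked with a short Python sketch. It assumes that the "total information" refers to a long two-fold series, approximated here by $\theta x = 2^j$ for $j = -40, \ldots, 9$, and that the number of dilutions is the number of two-fold levels needed to span the required ratio. Both assumptions, and all names below, are illustrative.

```python
import math

def info(u):
    """Relative information about log(theta) at theta*x = u, from (2.1) with n = 1."""
    return u * u / math.expm1(u)

# theta*x = 2**j for a long two-fold series; j = -40..9 stands in for an unbounded one.
levels = {j: info(2.0 ** j) for j in range(-40, 10)}
total = sum(levels.values())

def percent(js):
    """Percentage of the total information supplied by the levels 2**j, j in js."""
    return 100.0 * sum(levels[j] for j in js) / total

table_1 = {
    ">=8": percent(range(3, 10)),
    "4": percent([2]),
    "2": percent([1]),
    "1": percent([0]),
    "1/2": percent([-1]),
    "1/4": percent([-2]),
    "1/8": percent([-3]),
    "<=1/16": percent(range(-40, -3)),
}

def dilutions_needed(ratio):
    """Number of two-fold levels whose extremes satisfy x_max/x_min >= 16 * ratio."""
    return math.ceil(math.log2(16 * ratio)) + 1
```

Calling `dilutions_needed` with the assumed ratio of the upper to the lower limit of the density gives the length of series required under this coverage rule.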

The Rothamsted laboratory which brought the problem to
Fisher did 38 dilution series daily, and he observed that daily
calculation of the 38 M.L. estimates would be "exceedingly
laborious". Estimating $\theta$ by the method of moments (equating
the observed total number of sterile plates to the expected
number) can be done in less than 5 minutes per series by a
table which he provided, now Table VIII2 in Fisher and Yates.
Further, he showed in 1922 that the method of moments has an
asymptotic efficiency of 88%. Thus, although one of the
principal points in his 1922 paper was the superiority of M.L.
over moments, he recommended moments for this problem for what
seemed to him sound practical reasons.
The dilution series example reveals four types of problems
that recur throughout non-linear experiments. (1) Setting up
one or more criteria by which to judge alternative proposed
designs. Often, much weight will be given to getting good
estimates of the parameters. (2) Deciding how to proceed when
initial estimates of the parameters are dubious. The relative
feasibility, cost, and performance of sequential and
non-sequential methods become important here. (3) Any biometrician,
at least, would insist with Fisher that the experiment be capable
of providing its own internal estimate of C.V.($\hat\theta$). Dilution
series can do this if the model is correct and if large-sample
formulas can be trusted in small samples, a point that could
stand more checking. (4) Checks on the correctness of the model.
With two-fold dilution, about 7 dilutions should provide P
values between 5 and 95%, giving some data for $\chi^2$ and related
checks.

3. Other work by Fisher

Fisher's  remaining  work  on  non-linear  problems  mainly 
involved  using  the  concept  of  amount  of  information  as  helpful 
in  planning  data-collection,  as  illustrated  in  the  last  Chapter 
of  his  book,  The  Design  of  Experiments  (1935).  He  did  much 
work  of  this  kind,  which  I  will  not  describe,  on  the  estimation 
of  linkage  in  humans,  animals  and  plants.  In  plants,  for 
instance, the amount of linkage between two genes can be
estimated by forming a double heterozygote and either crossing it
with itself (selfing) or backcrossing it. For estimating close
linkage from selfing, he showed that formation of the double
heterozygote parent in coupling (AABB × aabb) can be 15 times as
efficient as its formation in repulsion (AAbb × aaBB), and is
nearly as efficient as backcrossing.

Fisher's first paper (1923) on the analysis of variance
dealt with a 12×6 factorial on potatoes. He first presents the
standard ANOVA into main effects and interactions. He then
remarks that the preceding analysis is given solely for
illustration, since the linear model is obviously unsuitable,
predicting negative expected yields for some of the plots. As
more reasonable, he proceeds to fit a non-linear product model,
which can be written

$$E(y_{ij}) = \mu(1+\alpha_i)(1+\beta_j).$$

This requires more work but as anticipated fits better, the
S.S. deviations being 847 against 981. From the 1923 paper, I
would not have expected Fisher's later ANOVA work to have
concentrated so largely on development of the linear model. I am
sorry that I never asked him why.

4. Quantal bioassay (non-sequential)

Another earlier non-linear problem on which much research
for practical experiments has been done is quantal bioassay
under a normal or a logit tolerance distribution, a problem
again with a 0-1 response. We are comparing a Standard (S) with
a Test (T) preparation thought to contain the same active
ingredient and therefore to act like a dilution or concentration
of the Standard. Thus if x is log dose, an amount x of S has
exactly the same effect as an amount x-M of T. Here M, the
log relative potency of Test to Standard, is the quantity to
be estimated.

To illustrate from the normal model, if n subjects are
given an amount x of S, the proportion responding is binomial
with

$$P = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}\exp\{-\tfrac12 (v-\mu_S)^2/\sigma^2\}\,dv = \int_{-\infty}^{(x-\mu_S)/\sigma} Z(t)\,dt, \qquad (4.1)$$

where Z(t) is the ordinate of the standard normal curve.
For T, the formula differs only in that $\mu_T$ replaces $\mu_S$. Thus the
problem is a three-parameter one, with one parameter M to be
estimated and two nuisance parameters.

For a single agent, Fisher showed that

$$I(\mu) = nZ^2/PQ\sigma^2,$$

which is maximized at P=0.5, $x = \mu$. Thus if $\mu_S$, $\mu_T$, and therefore
M were known, the optimum experiment would place all subjects
at the levels of S and T causing 50% response.
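The location of this maximum is easy to confirm numerically. The Python sketch below evaluates $nZ^2/PQ\sigma^2$ on a grid of standardized doses $t = (x-\mu)/\sigma$ for the normal model; the grid and the function name are illustrative assumptions.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal tolerance distribution

def info_mu(t, n=1, sigma=1.0):
    """I(mu) = n Z^2 / (P Q sigma^2) at the standardized dose t = (x - mu)/sigma."""
    z = _std.pdf(t)          # ordinate Z of the standard normal curve
    p = _std.cdf(t)          # probability of response P
    return n * z * z / (p * (1.0 - p) * sigma ** 2)

# Standardized doses from -3 to 3 in steps of 0.01.
grid = [i / 100.0 for i in range(-300, 301)]
best_t = max(grid, key=info_mu)
```

At the maximizing dose the value per subject is $2/\pi\sigma^2$, since $Z^2 = 1/2\pi$ and $PQ = 1/4$ there.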

Lacking this knowledge, experimenters use 2 or more levels
of each agent (hopefully straddling the 50% response) from which
the M.L. estimates of $\mu_S$, $\mu_T$ can be obtained.

If Y is the normal deviate corresponding to P in (4.1),
then $Y = (x-\mu_S)/\sigma$ is linear in x.
For a single agent, Fisher (1935) and others (see Finney,
1947) showed that M.L. estimates of $\mu$ and $\sigma$ could be obtained
iteratively by a weighted linear regression on x of a transform
y (the working transform) of the observed proportion p = r/n
of responding subjects.

This approach gives two fitted lines

$$Y_S = \bar y_S + b(x-\bar x_S) \qquad (4.2)$$

$$Y_T = \bar y_T + b(x-\bar x_T). \qquad (4.3)$$

To obtain the same response, $Y_S = Y_T$, the difference $\hat M$ between
the required doses $x_S$ and $x_T$ is, from (4.2) and (4.3),

$$\hat M = \bar x_S - \bar x_T - (\bar y_S - \bar y_T)/b,$$

where 1/b estimates the assumed common $\sigma$. Since b is first
estimated separately for Test and Standard, a test of
significance of $(b_T - b_S)$ is available and is regarded as an essential
check on the basic assumptions before the combined estimate b
is made.

Since $\hat M$ involves the ratio $(\bar y_S-\bar y_T)/b$ of two random
variables, Finney's criterion (1964) for the choice of levels
of $x_T$ and $x_S$ and of n is the half-width of Fieller's (1940) 5%
fiducial interval for M, which is found to be

$$\frac{1.96}{b(1-g)}\left[(1-g)\left(\frac{1}{\sum_S nw}+\frac{1}{\sum_T nw}\right)+\frac{(M-\bar x_S+\bar x_T)^2}{S_{xx}}\right]^{1/2}, \qquad (4.4)$$

where $S_{xx} = \sum nw(x-\bar x)^2$ summed over both agents and
$g = (1.96)^2/b^2 S_{xx}$ is the square of (1.96 times the coefficient
of variation of b).

In designing an experiment, the number of levels k, their
spacing d, and the sample size n at each level must be chosen.
From previous work on the Standard, good initial estimates of
$\sigma$ and $\mu_S$ should usually be available and an initial estimate $M_0$
is assumed. The strategy is to make $x_T = x_S - M_0$ at any level.
This should make $(M-\bar x_S+\bar x_T)$ in (4.4) small and the corresponding
term in (4.4) is often negligible. In this event, with n
constant, (4.4) becomes

$$\frac{1.96}{b}\left[\frac{2}{nW(1-g)}\right]^{1/2}, \qquad (4.5)$$

where $W = \sum w$ over the k levels for one agent. Regarding the
quantity multiplying (1.96) in (4.5) as a kind of effective
standard error of $\hat M$, Finney (1964, 496-7) tabulates $b^2V_E(\hat M)$ for
k = 2, 3, 4, total number of subjects N = 2kn = 48, 240, and a
range of choices of levels which give P values centred about
50%. A similar table is given for the logistic model in which
logit P is assumed linear in x.

These tables provide estimated optimum spacings and the
corresponding $b^2V_E(\hat M)$ for 2, 3, 4 levels and N = 48, 240.
Similar tables for other sample sizes and numbers of levels
could easily be provided.

The optimum levels assume good initial guesses. The only
work that I have seen allowing poor guesses is by Brown (1966).
Using the simpler Spearman-Karber estimates of $\mu_S$, $\mu_T$, he
recommends choices of n, d, $k_S$ and $k_T$ (which he allows to
differ), in order to give a desired width of 95% confidence
interval for M. This approach is similar to Fisher's in the
dilution series. Naturally, more levels are required to ensure
coverage of the 50% dose: Brown's worked example gives $k_S = 10$,
$k_T = 22$.
Thus, based on the Fieller criterion, available methods
furnish

(1) a near-optimum experiment, assuming good initial
estimates of $\sigma$, $\mu_S$, and M, and using large-sample theory,

(2) assuming the model correct, Fieller's limits for the
sample data, as a measure of the precision of $\hat M$,

(3) for more than 2 levels per agent, tests of the
adequacy of the model. The $\chi^2$ for deviations from the
model has (2k-3) d.f. These split into 1 d.f. for
non-parallelism, 1 d.f. for combined curvature, and (2k-5)
d.f. for other sources. Fortunately, as Finney shows,
k=4 does not demand more subjects than k=2.

I know of no intensive study of the robustness of the
presumed optima to poor initial guesses at the parameter values.
Extensions of Finney's tables to more spacings and more sample
sizes would reveal the effects of wrong spacing, through a bad
guess at $\sigma$, on $b^2V_E(\hat M)$. For r>1, it looks from his tables that
the effects are more serious if the guess is $\sigma/r$ than if it is
$r\sigma$, and more serious with fewer levels, as would be expected.
Sample size charts by Healy (1950) indicate for k=3 the effects
of wrong centering of the doses (through a poor guess at $\mu_S$).
More work on robustness and on the small-sample performance of
the recommended plans and formulas would be useful.

5. Quantal bioassay (sequential)

A well-known method, the Up and Down or Staircase method
(Dixon and Mood, 1948; Dixon, 1965, 1970), was devised for
experiments in which it is convenient to test subjects one at
a time, determining the level of the agent for the next subject
after seeing the result (0 or 1) for the previous subject. For
a given dose spacing d, the rule for a single agent (Standard
or Test) is the very simple one

$$x_{u+1} = x_u + d \ (\text{if } y_u = 0); \qquad x_{u+1} = x_u - d \ (\text{if } y_u = 1).$$

The idea is, of course, to concentrate dose levels in the
neighborhood of $\mu$, the median of the response y, which the
method is designed to estimate. The nominal sample size N is
defined as the number of trials, beginning with the first pair
in which a reversal (0 to 1 or 1 to 0) occurs. The estimate $\hat\mu$
of $\mu$ is the mean of the last N values of $x_u$, with an adjustment
(Dixon, 1970) depending on the numbers of 0's and 1's that were
obtained. The mean square error of $\hat\mu$ is approximately $2\sigma^2/N$
when d lies between the limits $d = 2\sigma/3$ and $d = 3\sigma/2$, with
$d = \sigma$ recommended as the most accurate spacing. This work is
based on exact small-sample computations.
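The Up and Down rule is simple enough to simulate in a few lines. The Python sketch below assumes a normal tolerance distribution and returns the doses in the nominal sample; the estimate computed at the end is the plain mean of those doses, without Dixon's (1970) adjustment, which is not reproduced here. All names and the numerical settings are hypothetical.

```python
import random
from statistics import NormalDist

def next_dose(x, y, d):
    """Up and Down rule: step up after a non-response (y = 0), down after a response (y = 1)."""
    return x + d if y == 0 else x - d

def up_and_down(mu, sigma, x0, d, n_nominal, rng):
    """Simulate one sequence under an assumed normal tolerance distribution.

    Returns the doses of the nominal sample, which begins with the first
    pair of consecutive trials showing a reversal of response.
    """
    tolerance = NormalDist(mu, sigma)
    doses, responses = [], []
    x, start = x0, None
    while start is None or len(doses) - start < n_nominal:
        y = 1 if rng.random() < tolerance.cdf(x) else 0  # response at dose x
        doses.append(x)
        responses.append(y)
        if start is None and len(responses) >= 2 and responses[-1] != responses[-2]:
            start = len(doses) - 2  # first trial of the first reversing pair
        x = next_dose(x, y, d)
    return doses[start:start + n_nominal]

sample = up_and_down(mu=0.0, sigma=1.0, x0=2.0, d=1.0, n_nominal=6, rng=random.Random(1))
mu_hat = sum(sample) / len(sample)  # unadjusted mean of the nominal sample
```

Repeating the call with independent random streams gives several estimates whose spread can be examined directly.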

A single sequence provides no usable estimate of $\sigma$, which
is undesirable if we wish to attach an estimated r.m.s. error
$\sqrt{2}\,\hat\sigma/\sqrt{N}$ to $\hat\mu$. Dixon (1970) recommends that the experiment be
run in independent sequences with N (say) = 6 in each sequence.
If there are r of these in parallel under the same operating
conditions, this speeds up completion of the experiment and
allows the variance of the mean $\bar\mu$ of the r estimates $\hat\mu_i$ to be
estimated from $\sum(\hat\mu_i-\bar\mu)^2/r(r-1)$. Alternatively,
other relevant variables may be changed from one set to another,
permitting the effects of these variables on $\mu$ to be
investigated by analysis of variance techniques.




For a logistic model, when a single (longer) sequence is
being used, Wetherill (1966) has proposed a change intended
to make the accuracy of $\hat\mu$ more robust against a poor initial
guess and use of a d too large. After 6 changes of response
type have occurred, estimate $\mu$, and restart near $\hat\mu$ using half
the original spacing. Here there remains the problem of an
estimate of $\sigma$ from the data.

Another sequential plan, using the Robbins-Monro stochastic
approximation process, attempts to do better than the Up and Down
by steadily shortening the steps as the sequence proceeds. If
a group of n subjects are tested at each step, the level of x
for the (u+1)th experiment is

$$x_{u+1} = x_u - \frac{c}{u}\left(p_u - \tfrac12\right),$$

where $p_u$ is the proportion responding at the uth step.
When the experiment is terminated, the estimate $\hat\mu$ is the level
at which the next experiment would have been conducted (Cochran
and Davis, 1965). With $\ell$ steps, the asymptotic formula for
$V(\hat\mu)$ is $\pi\sigma^2/2n\ell$, the value it would have if all trials could
be conducted at the optimum 50% level. To guard against a poor
initial guess at $\mu$, a 'delayed' version was also suggested in
which the step size c remains unchanged until both deaths and
survivals have been obtained. A modification with a similar
purpose has been proposed by Kesten (1958).
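A minimal Python sketch of such a process is given below. It assumes the standard Robbins-Monro recursion for the 50% point, $x_{u+1} = x_u - (c/u)(p_u - 1/2)$, with $p_u$ the proportion responding among the n subjects at step u, and a normal tolerance distribution. The choice $c = \sigma\sqrt{2\pi}$ (the reciprocal of the slope of the response curve at the median) is an assumption made here for illustration, as are all the names; the delayed version is not implemented.

```python
import math
import random
from statistics import NormalDist

def robbins_monro(mu, sigma, x0, c, n, steps, rng):
    """Run the assumed recursion x_{u+1} = x_u - (c/u) * (p_u - 1/2).

    Returns the level at which the next experiment would have been conducted,
    which serves as the estimate of the median.
    """
    tolerance = NormalDist(mu, sigma)
    x = x0
    for u in range(1, steps + 1):
        responders = sum(rng.random() < tolerance.cdf(x) for _ in range(n))
        p_u = responders / n               # observed proportion responding at step u
        x = x - (c / u) * (p_u - 0.5)      # steps shorten as u grows
    return x

def asymptotic_variance(sigma, n, steps):
    """Asymptotic variance pi * sigma^2 / (2 * n * steps) of the estimate."""
    return math.pi * sigma ** 2 / (2 * n * steps)

sigma = 1.0
c = sigma * math.sqrt(2.0 * math.pi)       # assumed step constant
mu_hat = robbins_monro(mu=0.0, sigma=sigma, x0=1.0, c=c, n=4, steps=3, rng=random.Random(0))
```

Averaging the squared error of `mu_hat` over many simulated runs and comparing it with `asymptotic_variance` is one way to see how far the asymptotic formula can be trusted for a given start.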

For small experiments with $N = n\ell = 12$, where $\ell$ is the
number of steps = 3, 4, 6, or 12, Davis (1971) has compared the
M.S.E.'s of $\hat\mu$ for three versions of the Robbins-Monro, two of
the Up and Down, and a non-sequential experiment using the
Spearman-Karber estimate, for normal, logistic, uniform and
exponential tolerance distributions. This is the first broad
comparison of the performances of different plans in small
samples. It is reassuring that the recommended step size and
the asymptotic formulas for $V(\hat\mu)$ both perform well for starts
within about $1.5\sigma$, about all that can be expected for N=12.
Overall, delayed versions of the Up and Down and the
Robbins-Monro performed best, both easily beating the non-sequential
methods.


6. Single continuous-variable response - a criterion


For the uth observation or trial (u=1,2,...,N) the model
now becomes

$$y_u = f(\xi_u,\theta) + e_u = f(\xi_{u1},\ldots,\xi_{uk};\,\theta_1,\ldots,\theta_p) + e_u. \qquad (6.1)$$

Here, $\xi_{ui}$ denotes the level at which the value of the ith variable
is set by the experimenter in the uth trial. There are k such
factors or variables, while p is the number of parameters
involved in the model. In the simplest models the $e_u$ are assumed
independently $N(0,\sigma^2)$.

The  paper  that  provided  the  impetus  to  intensive  work  is 
that  of  Box  and  Lucas  (1959).  Much  related  earlier  work, 
dealing  primarily  with  the  linear  case,  had  been  done  by  Kiefer 
(1959),  Elfving  (1952),  and  Chernoff  (1953),  who  considered  the 
choice  of  a  criterion  and  the  finding  of  the  design  points 
(levels of the factors $\xi_{ui}$).

The criterion proposed by Box and Lucas assumes interest
in all the parameters. It maximizes the generalization of
Fisher's amount of information, or equivalently minimizes the
asymptotic formula for Wilks' generalized variance of the M.L.
estimates of the $\theta_j$. From (6.1) the log likelihood is, apart
from a constant,

$$L = -\frac{1}{2\sigma^2}\sum_{u=1}^{N}(y_u - f_u)^2 .$$

It follows that the information matrix is

$$\left\{E\left(-\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}\right)\right\} = \frac{1}{\sigma^2}(X'X),$$

where X is the N×p matrix

$$X = \{x_{uj}\} = \left\{\frac{\partial f_u}{\partial\theta_j}\right\}.$$

The $x_{uj}$ are known when the factor levels $\xi_{ui}$ and the $\theta_j$ are
known. The criterion - choose design points to maximize
$|X'X|$ - assumes initial guesses $\theta_{j0}$ for practical use. Other
attractive features of this criterion (summarized by M.J. Box
and Draper (1971)) are as follows.

(1) It minimizes the volume of the asymptotic confidence
region for the $\theta_j$ (Kiefer, 1961).

(2) For response functions locally linear in the
neighborhood of the M.L. estimates, it maximizes
the joint posterior probability of the $\theta_j$ given a
non-informative prior $\prod d\theta_j$ (Draper and Hunter, 1966).

(3) It is invariant under changes of scale of the $\theta_j$.


7. Finding the design points - non-sequentially

Given  a  criterion,  the  next  step  is  the  complex  one  of 
finding  design  points  that  satisfy  the  criterion  for  a  specified 
N trials. In earlier work, Chernoff (1953) considered the case
where our interest is in $s \le p$ of the parameters, the remaining
(p-s) being nuisance parameters. His criterion was different:
minimizing the average of the asymptotic variances of the s
M.L. estimates. Following Elfving (1952), he showed that an
optimum design needs at most s(2p-s+1)/2 points, becoming
p(p+1)/2 when s=p, and p when s=1.

As a start, Box and Lucas (1959) assumed initial guesses
and sought an optimum set of levels when N=p, i.e. when there
are only as many trials as parameters to be estimated. They
point out unappealing features of this decision: no test of
the fit of the model, no attempt at robustness against poor
initial guesses, to which might be added no data for an
experimental estimate of $\sigma^2$.

One advantage with N=p is that X is square, so that
$|X'X| = |X|^2$ and it suffices to maximize $|X| = |x_{uj}|$.
Illustrative examples worked by Box and Lucas include the exponential
growth or decay curve, the Mitscherlich equation, and the
two-factor function

$$f(\xi_1,\xi_2;\theta_1,\theta_2) = \exp(-\theta_1\xi_1 e^{-\theta_2\xi_2}).$$

Depending on the complexity of the problem, methods
available for solution are

1. Geometric or analytic,

2. Calculate $|X|$ for a grid of values of the $\xi_{ui}$, fit a
quadratic to this grid and seek a maximum (with trouble
possible if $|X|$ has more than one turning value),

3. Various computer iterative hill-climbing techniques.
As a simple example with an analytic solution, consider
the exponential decay curve

$$f_u = \theta_1 e^{-\theta_2 t_u},$$

where t (time) is used for $\xi_{u1}$. The region feasible for
experiments is t(min) ≤ t ≤ t(max). For this $f_u$,

$$|X| = \begin{vmatrix} e^{-\theta_2 t_1} & -t_1\theta_1 e^{-\theta_2 t_1} \\ e^{-\theta_2 t_2} & -t_2\theta_1 e^{-\theta_2 t_2} \end{vmatrix} = \theta_1(t_1-t_2)e^{-\theta_2(t_1+t_2)}.$$

This can be written

$$|X| = \{\theta_1(t_1-t_2)e^{-\theta_2(t_1-t_2)}\}\{e^{-2\theta_2 t_2}\}.$$

For given $(t_1-t_2)$ and $\theta_2>0$, we want $t_2$ = t(min). The first
curly bracket is maximized when

$$t_1-t_2 = 1/\theta_2, \quad\text{giving}\quad t_1 = t(\min)+1/\theta_2,$$

or $t_1$ = t(max), whichever is smaller.
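The analytic solution for the exponential decay curve can be compared with a brute-force search, which is roughly what the grid method amounts to when no quadratic is fitted. The Python sketch below evaluates $|X|$ over all pairs of grid times and picks the largest; the parameter guesses, grid size, and function names are illustrative assumptions.

```python
import math

def det_X(t1, t2, theta1, theta2):
    """|X| for f = theta1 * exp(-theta2 * t) with N = p = 2 trials at times t1 and t2."""
    a1 = math.exp(-theta2 * t1)
    a2 = math.exp(-theta2 * t2)
    # Rows of X are (df/dtheta1, df/dtheta2) = (a, -t * theta1 * a) at t1 and t2.
    return a1 * (-t2 * theta1 * a2) - a2 * (-t1 * theta1 * a1)

def analytic_design(theta2, t_min, t_max):
    """Design from the analytic argument: t2 at t(min), t1 at t(min) + 1/theta2 or t(max)."""
    return min(t_min + 1.0 / theta2, t_max), t_min

def grid_design(theta1, theta2, t_min, t_max, steps=200):
    """Largest |X| over all grid pairs with t1 >= t2 (direct search, no fitted quadratic)."""
    grid = [t_min + (t_max - t_min) * i / steps for i in range(steps + 1)]
    pairs = ((t1, t2) for t1 in grid for t2 in grid if t1 >= t2)
    return max(pairs, key=lambda pair: abs(det_X(pair[0], pair[1], theta1, theta2)))

# Hypothetical initial guesses theta1 = 2, theta2 = 0.5 on the region 0 <= t <= 10.
t1_grid, t2_grid = grid_design(2.0, 0.5, 0.0, 10.0)
t1_exact, t2_exact = analytic_design(0.5, 0.0, 10.0)
```

Note that the design depends on the guess at $\theta_2$ only; $\theta_1$ multiplies $|X|$ by a constant and does not move the maximum.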

Coming  to  the  case  of  a  single  non-sequential  experiment 
with  N>p,  Atkinson  and  Hunter  (1968)  found  in  several  chemical 
examples  worked  by  computer  maximizing  that  with  N  a  multiple 
of  p,  the  optimum  plan  consisted  simply  of  N/p  replications  at 
each  of  the  p  optimum  sets  of  levels  for  the  case  N=p.  This 
result  certainly  simplifies  the  finding  of  optimum  plans. 
Although a counter-example showed that the result does not hold
in general, they proved, as a sufficient condition, that the
result will hold if the region of experimentation lies within
a certain ellipsoid in the x-space (a point that can be checked
by the experimenter).

M.J. Box (1968a, 1970a) considered also the case: N not a
multiple of p. In some problems he found that replications of
the N=p solution differing by at most 1 could be proved to be
optimal. In others, while this could not be proved, a computer
search was unable to locate anything superior to the
near-equal-replication solution. He also considered a one-factor,
two-parameter problem with $\xi_{u1}$ = time = $t_u$, where different trials
cost different amounts. The problem was to maximize $|X'X|$
subject to a fixed cost $C = \sum c_u$. The optimum again consisted
of experiments at only two times $t_1$, $t_2$, but with the difference
that $t_1$ and $t_2$ changed both with N and C and the numbers of
replications were no longer near-equal, so that more computing
effort was necessary.

The counter-example by Atkinson and Hunter is the linear
fitting of a bivariate regression, $f_u = \theta_1\xi_{1u}+\theta_2\xi_{2u}$, the
region of experimentation being $0 \le \xi_1, \xi_2 \le 1$. For N=p=2, the optimum
design is at the levels (1,0) and (0,1), which gives

$$X'X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

With N=6, three replications of this plan give

$$|X'X| = 9.$$

But two replications of the three-point plan (1,0), (0,1), (1,1)
give

$$X'X = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}, \qquad |X'X| = 12.$$

The key ellipse in this example is the circle $\xi_1^2+\xi_2^2 = 1$, and
the point (1,1) in the experimental region lies outside this circle.
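The two determinants in this counter-example can be verified directly. In the Python sketch below, each row of X is simply the design point itself, since the model is linear in the parameters; the helper names are illustrative.

```python
def xtx(points):
    """X'X for f = theta1*xi1 + theta2*xi2, where the rows of X are the design points."""
    s11 = sum(a * a for a, b in points)
    s12 = sum(a * b for a, b in points)
    s22 = sum(b * b for a, b in points)
    return [[s11, s12], [s12, s22]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# N = 6 in both plans.
replicated_two_point = [(1, 0), (0, 1)] * 3          # three replications of the N = p plan
replicated_three_point = [(1, 0), (0, 1), (1, 1)] * 2  # two replications of the three-point plan

det_two = det2(xtx(replicated_two_point))
det_three = det2(xtx(replicated_three_point))
```

The second plan wins because the extra point (1,1) adds to both diagonal elements at once, at the cost of a smaller off-diagonal term.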

The preceding results on the best set of design points are
conceptually similar to Fisher's original optimum for the
dilution series problem, and assume in effect good initial
estimates of the $\theta_j$. With poor initial guesses, the resulting plan
will not be optimal in any real sense. I have come across no
work analogous to Fisher's, where we start with a wider
spread than p points with the object of guaranteeing a specified
value of $|X'X|$ when the $\theta_j$ are known initially
only to lie within a certain region.


8. Finding the design points sequentially

As would be expected, the methods start with p points,
determined by first guesses and leading to M.L. estimates
$\hat\theta_{pj}$ of all the parameters. Box and Hunter (1965a) discuss how
to add points one at a time. If (N-1) steps have been completed,
so that the $\hat\theta_{N-1,j}$ are known, then $|X'X|$ as a function of the x's
for the Nth point takes the form

$$|X'X|_N = \begin{vmatrix} c_{11}+x_{1N}^2 & c_{12}+x_{1N}x_{2N} & \cdots & c_{1p}+x_{1N}x_{pN} \\ c_{12}+x_{1N}x_{2N} & c_{22}+x_{2N}^2 & \cdots & c_{2p}+x_{2N}x_{pN} \\ \vdots & \vdots & & \vdots \\ c_{1p}+x_{1N}x_{pN} & c_{2p}+x_{2N}x_{pN} & \cdots & c_{pp}+x_{pN}^2 \end{vmatrix},$$

where the $c_{ij}$ are known. The criterion is computed for all
points of a grid of values of the $\xi_{Ni}$ and a quadratic
fitted to find the maximizing values.
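To make the one-point-at-a-time step concrete, the Python sketch below applies it to the exponential decay curve $f = \theta_1 e^{-\theta_2 t}$ with p = 2. The $c_{ij}$ are accumulated from the trials already run, evaluated at the current estimates, and the determinant is then computed for each candidate time. For simplicity the best grid point is taken directly rather than fitting a quadratic, which is a swapped-in shortcut and not the procedure described above. The model choice and all names are illustrative assumptions.

```python
import math

def gradient(t, theta1, theta2):
    """Row of X at time t for the assumed model f = theta1 * exp(-theta2 * t)."""
    e = math.exp(-theta2 * t)
    return (e, -t * theta1 * e)

def det_after_adding(c, x):
    """Determinant of the 2x2 matrix with elements c_ij + x_i * x_j."""
    m11 = c[0][0] + x[0] * x[0]
    m12 = c[0][1] + x[0] * x[1]
    m22 = c[1][1] + x[1] * x[1]
    return m11 * m22 - m12 * m12

def next_point(previous_times, theta_hat, candidates):
    """Candidate time maximizing |X'X| after one more trial, at the current estimates."""
    rows = [gradient(t, *theta_hat) for t in previous_times]
    # c_ij: sums of products over the trials already completed.
    c = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    return max(candidates, key=lambda t: det_after_adding(c, gradient(t, *theta_hat)))

# Hypothetical situation: trials already run at t = 0 and t = 2, current estimates (1.0, 0.5).
candidates = [i / 100.0 for i in range(0, 1001)]
t_next = next_point([0.0, 2.0], (1.0, 0.5), candidates)
```

In practice the estimates would be recomputed from all the data after each added trial before the next call.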
M.J. Box (1970a) adds sequential sets of n=p points, each
put at the best p design points as estimated from the M.L. $\hat\theta$
obtained from the combined trials conducted to date. After a
time, both the M.L. $\hat\theta$ and the indicated set of p design points
for the rth set begin to change little from those in the
(r-1)th set. Box introduces a criterion $R_1$ as a guide to the
time when it is no longer worth changing points. A second
quantity $R_2$ compares the $|X'X|$ value given by all trials
conducted to date with the value that $|X'X|$ would have if it had
been possible to use our current estimate of the best design
points in all trials. Thus $R_2$ indicates the amount lost owing
to poor initial guesses at the $\theta_j$. In the simulated example
(3 parameters, 2 factors), some values of $R_1$ and $R_2$ are as in
Table 8.1.

Table 8.1. Values of $R_1$, $R_2$ in sequential plan

Set      2      3      4      5      6
$R_1$    1.40   1.09   1.06   1.04   1.02
$R_2$    0.78   0.86   0.86   0.88   0.91

In order to study the effect on the sequential process of
having initial prior information of different amounts about
different $\theta_j$, Draper and Hunter (1967a) took a multinormal prior

$$(2\pi)^{-p/2}\,|\Omega|^{-1/2}\exp\{-\tfrac12(\theta-\theta_0)'\Omega^{-1}(\theta-\theta_0)\},$$

where $\Omega$ is the p×p matrix of variances and covariances and the
$\theta_0$ are initial guesses. In the case where N trials had already
been completed at chosen levels $\xi_u$, they discussed where to put
a further n trials. Their criterion was to maximize the
posterior distribution of $\theta$ after (N+n) trials with respect
both to $\theta$ and to the values $\xi_u$ (u=N+1,...,N+n). Assuming
$f(\xi_u,\theta)$ to be locally linear, this leads to the approximate
criterion: maximize

$$|X'X + \sigma^2\Omega^{-1}| \qquad (8.1)$$

with respect to $\theta$ and $\xi_u$ (u=N+1,...,N+n). One hurdle is that
in (8.1) the values of $\theta$ are hidden in X'X and their maximizing
values after (N+n) trials depend on observations not yet taken.
The natural suggestion is to use $\hat\theta_N$ in maximizing (8.1) with
respect to the levels $\xi_u$.

The principal value of this type of prior is likely to be
the light it throws on how the design would be affected by
different amounts of prior information about the different
$\theta_i$.

As an illustration they work a problem with independent
normals $(\theta_{i0},\sigma_i^2)$, N=0, n=2, and a single $\xi_{u1}$
(= $t_u$, a time variable). As $\sigma_1$, $\sigma_2$ vary from 0 to
$\infty$, three basic design types predominate: $(t_1,t_2)$ =
(1.2, 6.9) for little prior information, $(t_1,t_2)$ = (1.2, 1.2),
where the experiment concentrates on estimating $\theta_1$, and
$(t_1,t_2)$ = (6.9, 6.9), where the emphasis is on $\theta_2$. Further
illustrations of this type would be of interest.
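
As a rough numerical companion to criterion (8.1), the sketch below
evaluates $|X'X + \sigma^2\Omega^{-1}|$ over a grid of two-point
designs. Everything specific in it is an assumption made for
illustration and is not taken from Draper and Hunter: the response
function (a two-parameter consecutive-reaction curve), the guesses
(0.7, 0.2), the error variance, the diagonal prior variances, and the
helper names. Derivatives are taken by central differences.

```python
import math

def response(t, th1, th2):
    # Assumed response function, for illustration only.
    return th1 / (th1 - th2) * (math.exp(-th2 * t) - math.exp(-th1 * t))

def sensitivities(t, th1, th2, h=1e-6):
    # Central-difference derivatives of the response w.r.t. th1 and th2.
    d1 = (response(t, th1 + h, th2) - response(t, th1 - h, th2)) / (2 * h)
    d2 = (response(t, th1, th2 + h) - response(t, th1, th2 - h)) / (2 * h)
    return d1, d2

def criterion(rows, sigma2, prior_var):
    # Determinant of X'X + sigma^2 * Omega^{-1} for a diagonal 2 x 2 Omega.
    a11 = sigma2 / prior_var[0] + sum(d1 * d1 for d1, _ in rows)
    a22 = sigma2 / prior_var[1] + sum(d2 * d2 for _, d2 in rows)
    a12 = sum(d1 * d2 for d1, d2 in rows)
    return a11 * a22 - a12 * a12

def best_pair(theta, sigma2, prior_var, grid):
    # Exhaustive search over all pairs t1 <= t2 taken from the grid.
    rows = [sensitivities(t, *theta) for t in grid]
    best = None
    for i in range(len(grid)):
        for j in range(i, len(grid)):
            value = criterion([rows[i], rows[j]], sigma2, prior_var)
            if best is None or value > best[0]:
                best = (value, grid[i], grid[j])
    return best[1], best[2]

grid = [round(0.1 * k, 1) for k in range(1, 151)]   # t = 0.1, ..., 15.0
theta_guess = (0.7, 0.2)                            # assumed initial guesses

# Very large prior variances: the criterion is essentially |X'X|.
diffuse_design = best_pair(theta_guess, 1.0, (1e8, 1e8), grid)
# Very small prior variance for th2 only: th2 is treated as almost known.
tight_on_th2 = best_pair(theta_guess, 1.0, (1e8, 1e-8), grid)
print(diffuse_design, tight_on_th2)
```

Under these assumptions the diffuse case should land close to the
(1.2, 6.9) type of design, and a very tight prior on the second
parameter should pull both observations toward early times, which is
the qualitative pattern described above.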

9 .  Tests  of  fit  of  the  model 

As a dividend, the sequential approach might provide some
data for a test of fit of the model (at least assuming $\sigma^2$
known), since y will have been determined in general at N>p
design points. If, however, the successive design points vary
over only a restricted part of the experimental region,
examination of the residuals may tell us little. Experience with
a wider range of non-linear models may throw more light on this
issue.

I know of no work in which an N>p experiment was deliberately
planned ab initio with a test of lack of fit as one objective.
Two suggestions have been made by Box and Lucas (1959).

First, having computed the combinations of levels of the
$\xi_u$ needed to maximize $|X'X|$, the experimenter might examine
where they occur in the space of interest, and add extra
points where he is most worried that the model may be incorrect.

Secondly, the experimenter may sometimes be reasonably
sure that if the model is incorrect, a more general model with
say one or two extra parameters gives an adequate fit. The
example cited is where the model (with a single $\xi$ variable) is

$$f_u = 1 - e^{-\theta_1 \xi_u}$$

where in fact the more general model

$$f_u = \frac{\theta_1}{\theta_1-\theta_2}\,(e^{-\theta_2 \xi_u} - e^{-\theta_1 \xi_u})$$

might be required. The experiment might be planned to estimate
$\theta_1$ and $\theta_2$ and test the N.H. $\theta_2 = 0$, which makes
the original model correct.
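
The nesting that this suggestion relies on can be checked
numerically. The short sketch below assumes the two model forms are
$1 - e^{-\theta_1\xi}$ and
$\theta_1(e^{-\theta_2\xi} - e^{-\theta_1\xi})/(\theta_1-\theta_2)$;
the parameter values and levels are arbitrary illustrative choices.

```python
import math

def simple_model(xi, th1):
    # One-parameter model: 1 - exp(-th1 * xi).
    return 1.0 - math.exp(-th1 * xi)

def general_model(xi, th1, th2):
    # Two-parameter model that should reduce to the simple one at th2 = 0.
    return th1 / (th1 - th2) * (math.exp(-th2 * xi) - math.exp(-th1 * xi))

levels = [0.5, 1.0, 2.0, 5.0, 10.0]   # arbitrary levels of the single factor

# Largest disagreement between the two models under the null (th2 = 0) ...
gap_at_null = max(abs(general_model(x, 0.7, 0.0) - simple_model(x, 0.7))
                  for x in levels)
# ... and under one arbitrary alternative (th2 = 0.2).
gap_off_null = max(abs(general_model(x, 0.7, 0.2) - simple_model(x, 0.7))
                   for x in levels)
print(gap_at_null, gap_off_null)
```

The first gap is zero to rounding error, while the second is not,
which is what makes a test of $\theta_2 = 0$ a test of the original
model.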


10 .  Discrimination  between  specified  models 

An approach by D. R. Cox (1961, 1962) for discrimination
between two models used a test of significance and was
asymmetric: the hypothesis that model 1 is correct was chosen as
the null hypothesis. The test criterion was a modified form
of the likelihood ratio, maximizing the asymptotic power against
model 2 as the alternative. Later, in planning an experiment to
discriminate between the probit and the logit models from
observations confined to 3 log dosages, Chambers and Cox (1967)
used a compromise symmetric form of this approach. They first
chose a criterion asymptotically powerful against A.H. = logistic,
given N.H. = probit. For this criterion, they determined the
optimum three dosage levels and the proportions of the
observations to be put at each level. Then they reversed the
procedure, having A.H. = probit, N.H. = logistic. Fortunately, the
optimum doses did not differ greatly in the two cases, so that a
good compromise design could be constructed. Unfortunately, as
Chambers and Cox note, this plan put the majority of the
observations at a high dose level with expected percent killed
over 99.6%. Thus the experiment would require large samples, as
will not surprise those who have worked with both probits and
logits. This approach might end, of course, by rejecting neither
model, one specific model, or both models.

An alternative approach, Box and Hill (1967), is symmetric
and extends to more than two specific models. For two models,
the approach supposes that n observations have already been
taken (at least enough to estimate any parameters involved) and
considers where best to put the (n+1)th for maximum
discrimination.

At first sight one might be inclined to seek the point (levels
of the factors) for which $|\hat{y}_1 - \hat{y}_2|$ is maximized,
where $\hat{y}_1$ and $\hat{y}_2$ are estimated from the results for
the first n observations. But as Box and Hill note, the precision
of estimation of $|\hat{y}_1 - \hat{y}_2|$ is also relevant.

Prior probabilities $\Pi_{i0}$ are first assigned to each model.
With two models, the choice might be $\Pi_{i0}$ = 1/2 and with m
models, $\Pi_{i0}$ = 1/m. After n runs, the posterior probability
for the ith model is

$$\Pi_{in} = \Pi_{i,n-1}\,p_i \Big/ \sum_{i=1}^{m} \Pi_{i,n-1}\,p_i$$

where $p_i$ is the probability density function of the nth
observation $y_n$ under model i.
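
The updating rule is a single application of Bayes' theorem and can
be sketched in a few lines. The normal densities, the predicted
values, the observation and the standard deviation below are all
hypothetical numbers chosen for illustration.

```python
import math

def normal_pdf(y, mean, sd):
    # Density of a normal observation; used here as p_i for each model.
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def update_model_probabilities(previous, densities):
    # Pi_{i,n} = Pi_{i,n-1} * p_i / sum_i Pi_{i,n-1} * p_i
    weighted = [pi * p for pi, p in zip(previous, densities)]
    total = sum(weighted)
    return [w / total for w in weighted]

probs = [0.5, 0.5]            # equal prior probabilities for two models
predictions = [1.0, 2.0]      # hypothetical predicted responses at the new point
y_n, sd = 1.8, 0.5            # hypothetical observation and error s.d.

probs = update_model_probabilities(
    probs, [normal_pdf(y_n, mu, sd) for mu in predictions])
print(probs)
```

The observation lies nearer the second prediction, so the second
model gains probability at the expense of the first.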

The criterion chosen for discrimination uses Shannon's
(1948) concept of entropy, also known as the Kullback-Leibler
information (1951). For m models the entropy is

$$-\sum_{i=1}^{m} \Pi_i \ln \Pi_i .$$

This has its maximum value when $\Pi_i$ = 1/m and becomes steadily
smaller as the $\Pi_i$ become unequal, i.e. as discrimination
improves. Hence the (n+1)th observation is chosen at levels $\xi$
which will maximize the expected decrease in entropy from the nth
to the (n+1)th experiment.
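
The behaviour of the entropy is easy to see numerically. The sketch
below computes $-\sum \Pi_i \ln \Pi_i$ for the uniform case and for
the rounded posterior probabilities reported for runs 4 to 8 in
Table 10.1 (the rounding means some rows do not sum exactly to 1).

```python
import math

def entropy(probs):
    # Shannon entropy; terms with zero probability contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Posterior probabilities of the four models after runs 4 to 8 (Table 10.1).
table_rows = {
    4: [0.01, 0.43, 0.50, 0.06],
    5: [0.00, 0.56, 0.43, 0.01],
    6: [0.00, 0.86, 0.13, 0.00],
    7: [0.00, 0.97, 0.02, 0.00],
    8: [0.00, 1.00, 0.00, 0.00],
}

uniform = entropy([0.25] * 4)                       # maximum, equal to ln 4
trace = [entropy(table_rows[n]) for n in sorted(table_rows)]
print(uniform, trace)
```

The entropy falls at every step of the tabulated sequence, reaching
zero when one model has posterior probability 1.00.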

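For two models the resulting discrimination criterion is
shown to be

$$D = \Pi_{1n}\Pi_{2n}\Big(\int p_1 \ln(p_1/p_2)\,dy_{n+1} + \int p_2 \ln(p_2/p_1)\,dy_{n+1}\Big).$$

If we can further assume that the models are locally linear,
with deviations $\epsilon_u$ that are $N(0,\sigma^2)$, where $\sigma^2$ is
known, the criterion becomes, for two models with predicted
responses $\hat{y}_1$ and $\hat{y}_2$,

$$D = \tfrac{1}{2}\Pi_{1n}\Pi_{2n}\Big\{\frac{(\sigma_1^2-\sigma_2^2)^2}{(\sigma^2+\sigma_1^2)(\sigma^2+\sigma_2^2)} + (\hat{y}_1-\hat{y}_2)^2\Big(\frac{1}{\sigma^2+\sigma_1^2}+\frac{1}{\sigma^2+\sigma_2^2}\Big)\Big\}$$

where $\sigma_i^2$ is the approximate variance of $\hat{y}_i$. For m
models, D is the corresponding expression summed over all pairs of
models.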
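
The step from the integral form of D to a closed form in the normal
case can be checked numerically. The sketch below assumes that, under
model i, the next observation is normal with mean $\hat{y}_i$ and
variance $\sigma^2 + \sigma_i^2$, and compares the closed form
$\tfrac{1}{2}\Pi_1\Pi_2\{(\sigma_1^2-\sigma_2^2)^2/[(\sigma^2+\sigma_1^2)(\sigma^2+\sigma_2^2)]
+ (\hat{y}_1-\hat{y}_2)^2[1/(\sigma^2+\sigma_1^2) + 1/(\sigma^2+\sigma_2^2)]\}$
with a trapezoidal evaluation of the two integrals. All numerical
inputs and function names are hypothetical.

```python
import math

def log_normal_pdf(y, mean, var):
    # Log density, used so that ratios of tiny densities stay stable.
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var

def d_closed_form(pi1, pi2, yhat1, yhat2, s1, s2, sigma2):
    # s1, s2: approximate variances of the two predicted responses.
    v1, v2 = sigma2 + s1, sigma2 + s2
    return 0.5 * pi1 * pi2 * ((s1 - s2) ** 2 / (v1 * v2)
                              + (yhat1 - yhat2) ** 2 * (1.0 / v1 + 1.0 / v2))

def d_by_integration(pi1, pi2, yhat1, yhat2, s1, s2, sigma2, steps=20000):
    # Trapezoidal rule for the sum of the two integrals, which equals
    # the integral of (p1 - p2) * (ln p1 - ln p2).
    v1, v2 = sigma2 + s1, sigma2 + s2
    spread = 12.0 * math.sqrt(max(v1, v2))
    lo, hi = min(yhat1, yhat2) - spread, max(yhat1, yhat2) + spread
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        l1 = log_normal_pdf(y, yhat1, v1)
        l2 = log_normal_pdf(y, yhat2, v2)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (math.exp(l1) - math.exp(l2)) * (l1 - l2)
    return pi1 * pi2 * total * h

args = (0.5, 0.5, 1.0, 1.5, 0.04, 0.09, 0.25)   # hypothetical inputs
closed = d_closed_form(*args)
numeric = d_by_integration(*args)
print(closed, numeric)
```

The two values agree closely. The closed form also shows the point
made by Box and Hill: D grows with the squared difference between the
predictions, but is discounted where those predictions are
imprecisely estimated.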

One  of  the  examples  worked  is  a  simulated  example,  with 
4  parameters  and  2  factors,  to  distinguish  among  the  first- 
to  fourth-order  reaction  curves.  The  experiment  starts  with  a 
grid  of  4  points  to  estimate  all  4  parameters  needed.  The 
results  proceed  as  in  Table  10.1. 

Table 10.1. Example of discrimination among models

 n    $\xi_1$   $\xi_2$    $\Pi_1$   $\Pi_2$   $\Pi_3$   $\Pi_4$

 1     25     575
 2     25     475
 3    125     575
 4    125     475     .01     .43     .50     .06
 5    125     600     .00     .56     .43     .01
 6    125     600     .00     .86     .13     .00
 7     50     450     .00     .97     .02     .00
 8    100     600     .00    1.00     .00     .00

From  the  beginning,  the  competition  is  between  the  second- 
and  third-order  curves,  the  second-order  soon  establishing  itself 
as  correct. 

Box and Hill also work an example in which (i) all models
are generalizations of model 1 and (ii) model 1 is correct so
that all models are correct. Here the entropy criterion seems
to be given an impossible task, but by their largest n (15), it
is tending towards selection of the simplest of the correct
models - an admirable performance. However, in more recent
examples of this situation in which n was continued to large
values, Siddik (1972) found that the posterior probability of
the simplest correct model rose to a value 0.85 to 0.95, but
then fluctuated erratically around that value. While it still
can be conjectured that the criterion will operate well in
practical experiments, its large-sample performance needs
further study.

A succeeding paper by Hill, Hunter and Wichern (1968)
recognizes that the best choice of the $\xi$ levels for
discrimination will not in general be those that give the best
parameter estimation for the correct model, and seeks to
reconcile these conflicting aims. If we knew that model j was the
correct model, we would choose the $\xi$ to maximize
$\Delta_j = |X'X|_j$ for model j. Call this value $\Delta_{j,\max}$
and let $\Delta_j$ denote the value of the estimation criterion for
any other choice of levels $\xi$. Similarly, let $D_{\max}$ be the
maximum expected decrease in entropy, and D the decrease obtained
from any other setting of the $\xi$. The criterion which these
authors suggest for choosing the $\xi$ levels is

$$C = w_1 D/D_{\max} + w_2 \sum_{j=1}^{m} \Pi_{jn}\,\Delta_j/\Delta_{j,\max}.$$

The $w_1$ and $w_2$ are weights which can be changed, as the
sequence of runs proceeds, to give increasing weight to good
parameter estimation when it becomes clearer that one model is
being selected by the discrimination technique. For $w_1$ they
suggest, as one possibility,

$$w_1 = \{m(1-\Pi_{bn})/(m-1)\}^{\lambda}$$

where $\Pi_{bn}$ is the probability assigned to the best model
before the (n+1)th observation is taken. The quantity $\lambda$ is a
positive power that controls the rate of decrease of $w_1$, the
weight assigned to the discrimination criterion. Initially, if all
$\Pi_{i0}$ = 1/m, $w_1$ is unity and all emphasis is given to good
discrimination. As $\Pi_{bn}$ approaches 1, so does $w_2$, emphasis
shifting to estimation for the most likely model.
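
A small sketch may make the behaviour of the weights concrete. The
form of $w_1$ is the one quoted above; taking $w_2 = 1 - w_1$ is an
assumption made here for illustration, as are the function names and
every numerical input.

```python
def discrimination_weight(pi_best, m, lam):
    # w1 = {m (1 - Pi_b) / (m - 1)} ** lambda
    return (m * (1.0 - pi_best) / (m - 1)) ** lam

def combined_criterion(d, d_max, probs, deltas, delta_max, lam):
    # C = w1 * D / D_max + w2 * sum_j Pi_j * Delta_j / Delta_{j,max}
    m = len(probs)
    w1 = discrimination_weight(max(probs), m, lam)
    w2 = 1.0 - w1   # assumption: the two weights sum to one
    estimation = sum(p * a / a_max
                     for p, a, a_max in zip(probs, deltas, delta_max))
    return w1 * d / d_max + w2 * estimation

w_start = discrimination_weight(0.25, 4, 2.0)   # all four models equally likely
w_end = discrimination_weight(1.0, 4, 2.0)      # one model certain

# Hypothetical values for one candidate setting of the factor levels.
c_value = combined_criterion(
    d=0.3, d_max=0.6,
    probs=[0.10, 0.70, 0.15, 0.05],
    deltas=[2.0, 3.0, 1.0, 0.5],
    delta_max=[4.0, 4.0, 2.0, 1.0],
    lam=2.0)
print(w_start, w_end, c_value)
```

With equal probabilities the discrimination weight is 1; once one
model is certain it is 0, and the candidate design is then judged
purely on estimation efficiency for that model.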

11. Model building

It  is  more  difficult  to  do  justice  to  the  work  here,  since 
the  strategies  will  change  as  the  accumulated  data  suggest  new 
ideas  to  experimenter  and  statistician. 

As one approach, Box and Hunter (1962, 1965b) consider the
case where the experimenter has at best a tentative model which
describes $f(\xi,\theta,t)$. If there are k factors $\xi_i$ which the
experimenter can manipulate, they suggest running the reaction,
with measurements of response at certain fixed times, for a $2^k$
factorial or fractional factorial in the levels of the $\xi_i$,
widely separated as far as operating restrictions permit. For
each combination of the factor levels they estimate each
$\theta_i$ and do a standard factorial analysis into main effects
and interactions for each $\hat{\theta}_i$. There are two objectives
in this procedure: (i) if the model is correct, the
$\hat{\theta}_i$ should not change systematically with time or with
the changes in the levels of the $\xi_i$, since the $\theta_i$ should
be constant, and (ii) the way in which the $\hat{\theta}_i$ change
may enable the experimenter to specify a vague model more
completely, or may suggest relations among the $\theta_i$ and the
$\xi_i$ that make sense mechanically.

In their simulated example they use letters A, B (the
initial concentrations of two reactants), C (the concentration
of a catalyst), and D (the temperature), to denote the factors
instead of our $\xi_i$. The tentative model was

$$E(y) = \frac{(B)\,k_1}{k_1-k_2}\,(e^{-k_2 t} - e^{-k_1 t}).$$

The initial experiment was a $2^4$ factorial in A, ..., D, measured
at 5 times. The nature of the reaction suggested that

$$k_1 = (A)^{p_1}(C)^{q_1}\,a_1 e^{-b_1/T}$$

$$k_2 = (A)^{p_2}(C)^{q_2}\,a_2 e^{-b_2/T}$$


where T is the absolute temperature. (There are now 8 parameters
to be estimated.) If this suggestion is correct, a factorial
analysis of $\ln k_1$ and $\ln k_2$, which was then carried out,
should show no effects of B and no interactions involving A, C,
and D. The analysis confirmed the model, as did careful
examination of the residuals (y-f) for the $2^4$ runs at the 5
times conducted initially. Finally, the 8 parameters were
estimated from the combined data. The second paper (Box and
Hunter, 1965) gives further discussion of the examination of
residuals, contour diagrams, and plots of the likelihood function
as diagnostic aids. The necessity for repeated interchange of
ideas between experimenter and statistician is stressed.
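
The logic of the factorial check can be illustrated with noise-free
values. The sketch below assumes
$\ln k_1 = p_1 \ln A + q_1 \ln C + \ln a_1 - b_1/T$, with D standing
for the temperature T, and computes factorial effects over a $2^4$
design. The parameter values and factor levels are hypothetical, and
in a real analysis $\ln k_1$ would be an estimate from each run rather
than an exact value.

```python
import math
from itertools import product

# Hypothetical parameter values and factor levels, for illustration only.
p1, q1, a1, b1 = 1.0, 0.5, 2.0e3, 3000.0
levels = {"A": (1.0, 2.0), "B": (1.0, 2.0),
          "C": (0.1, 0.2), "D": (300.0, 320.0)}   # D = absolute temperature

def ln_k1(A, B, C, T):
    # Under the suggested form, ln k1 does not involve B at all.
    return p1 * math.log(A) + q1 * math.log(C) + math.log(a1) - b1 / T

# Build the 2^4 factorial: each run is (signs, response).
runs = []
for signs in product((-1, 1), repeat=4):
    settings = [levels[name][(s + 1) // 2] for name, s in zip("ABCD", signs)]
    runs.append((signs, ln_k1(*settings)))

def effect(columns):
    # Mean response where the product of the chosen sign columns is +1,
    # minus the mean where it is -1.
    total = 0.0
    for signs, y in runs:
        sign = 1
        for c in columns:
            sign *= signs[c]
        total += sign * y
    return total / (len(runs) / 2)

main_A = effect([0])        # expected to equal p1 * ln(2) for these levels
main_B = effect([1])        # expected to vanish
inter_AC = effect([0, 2])   # expected to vanish
print(main_A, main_B, inter_AC)
```

A non-zero B effect or a non-zero interaction in the corresponding
analysis of real estimates would be evidence against the suggested
form for the rate constants.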

An interesting review of approaches and problems in
model-building by M.J. Box (1968b) presents his experiences, with
discussion from the audience.


12 .  More  than  one  measured  response 

In some chemical reactions it is possible to measure more
than one response $y_{lu}$ (l=1,2,...,L) which provides information
about some or all of the parameters $\theta_i$. The simplest example
quoted is the one-parameter exponential

$$y_{1u} = e^{-\theta \xi_u} + \epsilon_{1u}, \qquad y_{2u} = 1 - e^{-\theta \xi_u} + \epsilon_{2u}$$

where the single factor $\xi_u$ represents time. Note that $y_{1u}$,
$y_{2u}$ do not add to 1 because of the experimental errors
$\epsilon_{lu}$. In a general approach the model is

$$y_{lu} = f_l(\xi_u,\theta) + \epsilon_{lu}.$$

It has not been considered realistic to assume $\epsilon_{lu}$,
$\epsilon_{mu}$ independent. Instead, they are given a multivariate
normal distribution with variance-covariance matrix $\sigma_{lm}$.

The first paper on this problem, Box and Draper (1965), did
not assume the $\sigma_{lm}$ known in advance, and merely assigned a
'non-informative' prior distribution to the $\sigma_{lm}$. Later
papers took the more tractable problem in which the $\sigma_{lm}$ are
assumed known, and will be considered first.

With known $\sigma_{lm}$, Draper and Hunter (1966) assigned a
Bayesian prior $\prod d\theta_i$ and followed the method which led to
the $|X'X|$ criterion for a single response, as mentioned in
Section 7. It helps to write

$$v_{lm} = \sum_{u=1}^{N} \{y_{lu} - f_l(\xi_u,\theta)\}\{y_{mu} - f_m(\xi_u,\theta)\}. \qquad (12.1)$$

They find that the posterior probability is

$$p(\theta|y) = C \exp\Big\{-\tfrac{1}{2}\sum_{l=1}^{L}\sum_{m=1}^{L} \sigma^{lm} v_{lm}\Big\}. \qquad (12.2)$$

In this approach the $\theta_i$ would be estimated by minimizing

$$\sum_{l}\sum_{m} \sigma^{lm} v_{lm} \qquad (12.3)$$

that is, by the natural extension of the method of least
squares to the case of multivariate normal deviations.
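
To make the estimation rule concrete, the sketch below applies it to
a two-response, one-parameter exponential pair,
$f_1 = e^{-\theta\xi}$ and $f_2 = 1 - e^{-\theta\xi}$, minimizing
$\sum_l\sum_m \sigma^{lm} v_{lm}$ over a grid of $\theta$ values. The
data, the fixed "errors", the covariance matrix and the helper names
are all hypothetical.

```python
import math

def residual_products(theta, xis, y1, y2):
    # v_lm = sum over trials of (residual of response l) * (residual of response m).
    r1 = [y - math.exp(-theta * x) for x, y in zip(xis, y1)]
    r2 = [y - (1.0 - math.exp(-theta * x)) for x, y in zip(xis, y2)]
    r = (r1, r2)
    return [[sum(a * b for a, b in zip(r[l], r[m])) for m in range(2)]
            for l in range(2)]

def inverse_2x2(s):
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def weighted_sum(theta, xis, y1, y2, sigma):
    # sum_l sum_m sigma^{lm} v_lm, with sigma^{lm} the inverse of sigma.
    v = residual_products(theta, xis, y1, y2)
    w = inverse_2x2(sigma)
    return sum(w[l][m] * v[l][m] for l in range(2) for m in range(2))

# Hypothetical data: true theta = 0.5 with small fixed perturbations.
xis = [0.5, 1.0, 2.0, 4.0]
true_theta = 0.5
e1 = [0.010, -0.020, 0.015, -0.005]
e2 = [-0.015, 0.010, 0.005, -0.010]
y1 = [math.exp(-true_theta * x) + e for x, e in zip(xis, e1)]
y2 = [1.0 - math.exp(-true_theta * x) + e for x, e in zip(xis, e2)]
sigma = [[4e-4, 2e-4], [2e-4, 4e-4]]   # assumed known covariance matrix

candidates = [round(0.01 * k, 2) for k in range(1, 201)]
theta_hat = min(candidates,
                key=lambda th: weighted_sum(th, xis, y1, y2, sigma))
print(theta_hat)
```

A grid search is used only to keep the sketch short; any
one-dimensional minimizer would serve.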

By an extension of the univariate method, the further
assumption that the response functions are approximately linear
in the vicinity of the M.L. estimates leads to the criterion:
choose the design points to maximize

$$\Delta = \Big|\sum_{l=1}^{L}\sum_{m=1}^{L} \sigma^{lm} X_l'X_m\Big| \qquad (12.4)$$

where for given l, $X_l$ is the N x p matrix

$$X_l = (x_{lui}) = \Big(\frac{\partial f_l(\xi_u,\theta)}{\partial \theta_i}\Big).$$

The matrices $X_l$ should strictly be evaluated at the M.L.
$\hat{\theta}$ after the experiment has been completed, which cannot
be done when the experiment is being planned. If no trials have
been conducted, the suggestion is to compute the $X_l$ for initial
guesses $\theta_0$; if N trials have been done and a further n are
being planned, use the $X_l$ at the M.L. estimates after N trials.
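
For a one-parameter problem the determinant in (12.4) is a scalar,
which makes the criterion easy to sketch. The example below assumes
two responses $f_1 = e^{-\theta\xi}$ and $f_2 = 1 - e^{-\theta\xi}$,
an initial guess for $\theta$, and a hypothetical inverse covariance
matrix, and searches a grid for the best single design point.

```python
import math

def delta(xis, theta, sigma_inv):
    # With p = 1 each X_l is a single column of N derivatives, so
    # sum_l sum_m sigma^{lm} X_l' X_m is a scalar and is its own determinant.
    x1 = [-x * math.exp(-theta * x) for x in xis]   # d f_1 / d theta
    x2 = [x * math.exp(-theta * x) for x in xis]    # d f_2 / d theta
    cols = (x1, x2)
    return sum(sigma_inv[l][m] * sum(a * b for a, b in zip(cols[l], cols[m]))
               for l in range(2) for m in range(2))

theta_guess = 0.5                          # assumed initial guess
sigma_inv = [[4.0, -2.0], [-2.0, 4.0]]     # hypothetical sigma^{lm}

grid = [round(0.01 * k, 2) for k in range(1, 1001)]   # xi = 0.01, ..., 10.00
best_xi = max(grid, key=lambda x: delta([x], theta_guess, sigma_inv))
print(best_xi)
```

In this particular sketch the covariance matrix enters only as a
positive multiplier, so the best point is $\xi = 1/\theta$ whatever
the covariances; with more parameters the covariances would in
general change the design as well as the value of the criterion.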

Illustrations were given by Draper and Hunter for a
two-response, one-parameter problem and by M.J. Box (1970a) for a
two-response, two-parameter and for a two-response,
four-parameter problem. Draper and Hunter's interest was to see
how the optimum plan and the value of the criterion $\Delta$ in
(12.4) varied with $\sigma_{11}$, $\sigma_{22}$, and $\rho$. Box
considered whether replications of the optimum N=p plan were
still to be recommended. For N a multiple of p, equal
replications of this optimum were the best he could find. For N
not a multiple of p, the best of the near-equal replications was
not optimal, but near enough as a good start in a computer search
for anything better. M.J. Box comments that this search may
not be worth the trouble, though further experience is needed.

Draper and Hunter (1967b) have also extended to this case
the single-response work reported in Section 8 for a multinormal
prior. With two responses, for instance, the criterion to be
maximized is

$$\Delta = |\sigma^{11}X_1'X_1 + \sigma^{22}X_2'X_2 + \sigma^{12}(X_1'X_2 + X_2'X_1) + \Omega^{-1}|$$

where $\Omega$ is the prior covariance matrix of the $\theta_i$.

Returning to the case of a 'non-informative' prior that
leads to the criterion (12.4), M.J. Box (1970b) considered two
practical complications. (1) The response variables may not
be measured directly but computed from other prime variables,
measured directly, whose values change as the design points
change. (2) The factor levels $\xi$ may be themselves subject to
error (a familiar problem in experimentation). Consequences
are that the $\sigma_{lm}$ vary with the design points and that it
becomes less reasonable to think of 'dependent' variables $y_{lu}$
and 'independent' variables $\xi_u$. Nevertheless, by assuming
that the basic measurements are independent, with known
variances, he has developed a computer program (essentially
involving a known $\sigma_{lm}$ changing with the design points). An
example illustrates the application of this technique.

As mentioned, Box and Draper (1965) considered the case of
unknown $\sigma_{lm}$, assigning the prior

$$p(\sigma_{lm}) \propto |\sigma_{lm}|^{-(L+1)/2}$$

which is the multivariate extension of assigning a uniform prior
to (log $\sigma$) in the univariate case. They find the posterior

$$p(\theta|y) = C\,|v_{lm}|^{-N/2}$$

where C is a constant. The $\theta_i$ would then be estimated by
minimizing $|v_{lm}|$. At first sight this criterion seems rather
different from the criterion (12.3): minimize

$$\sum_{l}\sum_{m} \sigma^{lm} v_{lm}$$

which emerged when the $\sigma_{lm}$ were assumed known. Box and
Draper show, however, that there is a natural resemblance. Let
$V_{lm}$ be the cofactor of $v_{lm}$. Now $|v_{lm}|$ can be calculated
by multiplying the elements of $v_{lm}$ in any single row or column
by their cofactors and adding. It follows that

$$|v_{lm}| = \frac{1}{L}\sum_{l}\sum_{m} V_{lm} v_{lm}. \qquad (12.5)$$

Thus the weights $\sigma^{lm}$ in (12.3) are replaced by weights
proportional to the M.L. estimates of the $\sigma^{lm}$.
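
The cofactor argument can be verified on a small matrix. The sketch
below takes a hypothetical symmetric 3 x 3 matrix of $v_{lm}$ values,
computes its cofactors, and checks two things: that summing
cofactor-times-element over all L rows gives L times the determinant,
and that the cofactors divided by the determinant form the inverse
matrix, which is why they act as weights proportional to estimates of
the $\sigma^{lm}$. The averaged double-sum form of the identity is
one reading of the relation, stated here as an assumption.

```python
def minor(mat, i, j):
    # Matrix with row i and column j removed.
    n = len(mat)
    return [[mat[r][c] for c in range(n) if c != j]
            for r in range(n) if r != i]

def det(mat):
    # Laplace expansion along the first row (fine for small matrices).
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det(minor(mat, 0, j))
               for j in range(len(mat)))

def cofactor(mat, i, j):
    return (-1) ** (i + j) * det(minor(mat, i, j))

# Hypothetical symmetric matrix of sums of products v_lm, with L = 3.
v = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
L = len(v)
V = [[cofactor(v, l, m) for m in range(L)] for l in range(L)]

determinant = det(v)
# Each row expansion gives |v|, so the double sum over all rows is L * |v|.
double_sum = sum(V[l][m] * v[l][m] for l in range(L) for m in range(L)) / L

# Transposed cofactors divided by |v| give the inverse of v.
inverse = [[V[m][l] / determinant for m in range(L)] for l in range(L)]
check = sum(inverse[0][k] * v[k][0] for k in range(L))   # (v^{-1} v)[0][0]
print(determinant, double_sum, check)
```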

The two simulated examples worked both involve only a
single $\xi_u$ variate (time). One example has two responses, one
parameter; one has 3 responses, 2 parameters. It is now necessary
to take N>p in order to obtain estimates of the weights $V_{lm}$.
The values chosen were N=10 for the L=2, p=1 example and N=12 for
the L=3, p=2 example, no attempt being made to find optimum
values of t (design points). From the worked examples a
recommendation is made to plot the complete posterior functions
for the $\theta_i$ obtained (i) from each individual response
function, (ii) from each pair and (with L=3) from the three
combined. These plots indicate the type and amount of information
supplied about the respective $\theta_i$ by individual responses and
combinations of them. They can also reveal deficiencies in the
model, e.g. when the posterior from $y_{1u}$ has little overlap with
that for $y_{2u}$.


13 .  Comments 

From  the  work  on  the  0-1  and  the  continuous-variable 
response,  we  now  have  a  good  grasp  of  the  multiple  desirable 
objectives in a non-linear experiment, a body of techniques
that  concentrate  on  a  general-purpose  criterion  for  giving  good 
estimates  of  all  the  parameters,  a  method  for  discriminating 
among  specified  models,  and  an  attack  on  the  problem  of  model¬ 
building.  Since  much  of  the  continuous-variable  work  is  recent, 
with  simulated  examples,  I  would  expect  a  period  of  digestion 
by  experimenters  in  industry,  with  feedback  on  features  that 
they  like  and  don't  like  and  additional  properties  desired. 
Further,  as  the  groups  at  Wisconsin  and  I.C.I.  warn  us,  the 
industrial  workers  have  still  harder  problems  awaiting  attack. 

It  is  easy  to  list  much  additional  related  work  that  would 
be  relevant.  To  mention  a  few  areas: 

(1)  Criteria and designs for the estimation of only some
of the parameters, the others being regarded as
nuisance parameters.

(2)  Compromise designs, non-optimal by any single
criterion, that cope with several different objectives.
For instance, a non-sequential plan might deliberately
start with N>p distinct points, in order to provide
(i) some robustness against poor initial $\theta_0$, (ii)
either a check on the correctness of the model if an
outside estimate of $\sigma^2$ is available, or (iii) an
internal estimate of $\sigma^2$ if the model can be assumed
correct. Something to provide both a check and an
estimate of $\sigma^2$ might I suppose be possible by a
development analogous to Tukey's 1 d.f. for
non-additivity under the linear model.

(3)  Since  the  approach  and  formulas  are  to  a  large  extent 
asymptotic,  checks  by  computer  studies  on  the  small 
sample  performance  of  the  'optimum'  plans  and  formulas. 

(4)  Finally, and in no invidious sense, I hope that more
people will enter this field, with a resulting broader
range of problems attacked, of techniques developed,
and of viewpoints. The discussion in the Royal
Statistical Society, following Kiefer's (1959)
presentation of his work on optimum linear plans, revealed
doubts about the wisdom of concentrating on optimizing
any single criterion. Reasons advanced were that
optimizing may require mathematical assumptions or
restrictions found unreasonable in many applications,
that the experimenter's aims may change when he begins
to see some results, and that, in sequential experiments,
rules leaving flexibility of judgment to the experimenter
and therefore sounding vague to some degree may be better
than fixed rules laid down by a statistician's
criterion. While this part of the discussion was
somewhat negativistic in tone, it suggested that
approaches from differing viewpoints have an
important role.

In a paper delivered at these meetings, Wheeler (1972)
maintains that the experimenter should seek a design that will
be reasonably efficient under a variety of situations which he
judges that he may face. Thus for insurance he may want to fit
a model more complex than the one that he hopes is correct, he
may fear some loss of observations from accidents, and may want
at least a specified number of degrees of freedom for estimation
of $\sigma^2$. To indicate the inefficiency of any proposed plan,
relative to a plan that concentrates solely on efficiency of
estimation, Wheeler uses as criterion the relative maximum
variance of the predicted response over the experimental region,
illustrating how the extensive results on optimum design for
linear models, in particular Wynn (1970), provide computer
methods for meeting these goals.

I wish to thank P. Morse, S. M. Siddik, and R. E. Wheeler
for information about recent work.